Nucleic acid binding proteins

ABSTRACT

Disclosed herein are methods of designing zinc finger binding polypeptides for binding to particular target sequences comprising overlapping nucleotide quadruplets.

This application is the national phase of international application PCT/GB98/01516, filed May 26, 1998, which designated the U.S.

The present invention relates to nucleic acid binding proteins. In particular, the invention relates to a method for designing a protein which is capable of binding to any predefined nucleic acid sequence.

Protein-nucleic acid recognition is a commonplace phenomenon which is central to a large number of biomolecular control mechanisms which regulate the functioning of eukaryotic and prokaryotic cells. For instance, protein-DNA interactions form the basis of the regulation of gene expression and are thus one of the subjects most widely studied by molecular biologists.

A wealth of biochemical and structural information explains the details of protein-DNA recognition in numerous instances, to the extent that general principles of recognition have emerged. Many DNA-binding proteins contain independently folded domains for the recognition of DNA, and these domains in turn belong to a large number of structural families, such as the leucine zipper, the “helix-turn-helix” and zinc finger families.

Despite the great variety of structural domains, the specificity of the interactions observed to date between protein and DNA most often derives from the complementarity of the surfaces of a protein α-helix and the major groove of DNA [Klug, (1993) Gene 135:83-92]. In light of the recurring physical interaction of α-helix and major groove, the tantalising possibility arises that the contacts between particular amino acids and DNA bases could be described by a simple set of rules; in effect a stereochemical recognition code which relates protein primary structure to binding-site sequence preference.

It is clear, however, that no code will be found which can describe DNA recognition by all DNA-binding proteins. The structures of numerous complexes show significant differences in the way that the recognition α-helices of DNA-binding proteins from different structural families interact with the major groove of DNA, thus precluding similarities in patterns of recognition. The majority of known DNA-binding motifs are not particularly versatile, and any codes which might emerge would likely describe binding to a very few related DNA sequences.

Even within each family of DNA-binding proteins, moreover, it has hitherto appeared that the deciphering of a code would be elusive. Due to the complexity of the protein-DNA interaction, there does not appear to be a simple “alphabetic” equivalence between the primary structures of protein and nucleic acid which specifies a direct amino acid to base relationship.

International patent application WO 96/06166 addresses this issue and provides a “syllabic” code which explains protein-DNA interactions for zinc finger nucleic acid binding proteins. A syllabic code is a code which relies on more than one feature of the binding protein to specify binding to a particular base, the features being combinable in the forms of “syllables”, or complex instructions, to define each specific contact.

However, this code is incomplete, providing no specific instructions permitting the specific selection of nucleotides other than G in the 5′ position of each quadruplet. The method relies on randomisation and subsequent selection in order to generate nucleic acid binding proteins for other specificities. Moreover, this document reports that zinc fingers bind to a nucleic acid triplet or multiples thereof. We have now determined that zinc finger binding sites are determined by overlapping 4 bp subsites, and that sequence-specificity at the boundary between subsites arises from synergy between adjacent fingers. This has important implications for the design and selection of zinc fingers with novel DNA binding specificities.

The present invention provides a more complete code which permits the selection of any nucleic acid sequence as the target sequence, and the design of a specific nucleic acid-binding protein which will bind thereto. Moreover, the invention provides a method by which a zinc finger protein specific for any given nucleic acid sequence may be designed and optimised. The present invention therefore concerns a recognition code which has been elucidated for the interactions of classical zinc fingers with nucleic acid. In this case a pattern of rules is provided which covers binding to all nucleic acid sequences.

According to a first aspect of the present invention, therefore, we provide a method for preparing a nucleic acid binding protein of the Cys2-His2 zinc finger class capable of binding to a nucleic acid quadruplet in a target nucleic acid sequence, wherein binding to base 4 of the quadruplet by an α-helical zinc finger nucleic acid binding motif in the protein is determined as follows:

-   a) if base 4 in the quadruplet is A, then position +6 in the α-helix is Gln and ++2 is not Asp;
-   b) if base 4 in the quadruplet is C, then position +6 in the α-helix may be any residue, as long as position ++2 in the α-helix is not Asp.

Preferably, binding to base 4 of the quadruplet by an α-helical zinc finger nucleic acid binding motif in the protein is additionally determined as follows:

-   c) if base 4 in the quadruplet is G, then position +6 in the α-helix is Arg; or position +6 is Ser or Thr and position ++2 is Asp;
-   d) if base 4 in the quadruplet is T, then position +6 in the α-helix is Ser or Thr and position ++2 is Asp.

The quadruplets specified in the present invention are overlapping, such that, when read 3′ to 5′ on the strand of the nucleic acid, base 4 of the first quadruplet is base 1 of the second, and so on. Accordingly, in the present application, the bases of each quadruplet are referred to by number, from 1 to 4, 1 being the 3′ base and 4 being the 5′ base.
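The overlap described above can be illustrated with a short sketch. It assumes the target is supplied already read 3′ to 5′ and that n fingers cover 3n + 1 bases; the function name and the example sequence are hypothetical and serve only to show the shared boundary base.

```python
def overlapping_quadruplets(seq_3to5):
    """Split a sequence, read 3' to 5', into 4-base subsites.

    Adjacent subsites share one base: base 4 of one quadruplet is
    base 1 of the next, so the window advances by 3 bases each time.
    """
    seq = seq_3to5.upper()
    if len(seq) < 4 or (len(seq) - 1) % 3 != 0:
        raise ValueError("sequence length must be 3n + 1 for n quadruplets")
    return [seq[i:i + 4] for i in range(0, len(seq) - 3, 3)]


# Illustrative 10-base sequence (not taken from the specification).
quads = overlapping_quadruplets("ACGTTGCAAG")
print(quads)
```

Each quadruplet in the returned list begins with the last base of the one before it, which is the position at which +6 of one finger and +2 of the adjacent finger cooperate.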

All of the nucleic acid-binding residue positions of zinc fingers, as referred to herein, are numbered from the first residue in the α-helix of the finger, ranging from +1 to +9. “−1” refers to the residue in the framework structure immediately preceding the α-helix in a Cys2-His2 zinc finger polypeptide.

Residues referred to as “++2” are residues present in an adjacent (C-terminal) finger. They reflect the synergistic cooperation between position +2 on base 1 and position +6 of the preceding (N-terminal) finger on base 4 of the preceding (3′) quadruplet, which is the same base due to the overlap. Where there is no C-terminal adjacent finger, “++” interactions do not operate.

Cys2-His2 zinc finger binding proteins, as is well known in the art, bind to target nucleic acid sequences via α-helical zinc metal atom coordinated binding motifs known as zinc fingers. Each zinc finger in a zinc finger nucleic acid binding protein is responsible for determining binding to a nucleic acid quadruplet in a nucleic acid binding sequence. Preferably, there are 2 or more zinc fingers, for example 2, 3, 4, 5 or 6 zinc fingers, in each binding protein. Advantageously, there are 3 zinc fingers in each zinc finger binding protein.

The method of the present invention allows the production of what are essentially artificial nucleic acid binding proteins. In these proteins, artificial analogues of amino acids may be used, to impart the proteins with desired properties or for other reasons. Thus, the term “amino acid”, particularly in the context where “any amino acid” is referred to, means any sort of natural or artificial amino acid or amino acid analogue that may be employed in protein construction according to methods known in the art. Moreover, any specific amino acid referred to herein may be replaced by a functional analogue thereof, particularly an artificial functional analogue. The nomenclature used herein therefore specifically comprises within its scope functional analogues of the defined amino acids.

The α-helix of a zinc finger binding protein aligns antiparallel to the nucleic acid strand, such that the primary nucleic acid sequence is arranged 3′ to 5′ in order to correspond with the N-terminal to C-terminal sequence of the zinc finger. Since nucleic acid sequences are conventionally written 5′ to 3′, and amino acid sequences N-terminus to C-terminus, the result is that when a nucleic acid sequence and a zinc finger protein are aligned according to convention, the primary interaction of the zinc finger is with the − strand of the nucleic acid, since it is this strand which is aligned 3′ to 5′. These conventions are followed in the nomenclature used herein. It should be noted, however, that in nature certain fingers, such as finger 4 of the protein GLI, bind to the + strand of nucleic acid: see Suzuki et al., (1994) NAR 22:3397-3405 and Pavletich and Pabo, (1993) Science 261:1701-1707. The incorporation of such fingers into nucleic acid binding molecules according to the invention is envisaged.

The invention provides a solution to a problem hitherto unaddressed in the art, by permitting the rational design of polypeptides which will bind nucleic acid quadruplets whose 5′ residue is other than G. In particular, the invention provides for the first time a solution for the design of polypeptides for binding quadruplets containing 5′ A or C.

Position +6 in the α-helix is generally responsible for the interaction with the base 4 of a given quadruplet in the target. According to the present invention, an A at base 4 interacts with a Glutamine (Gln or Q) at position +6, while a C at base 4 will interact with any amino acid provided that position ++2 is not Aspartic acid (Asp or D).

The present invention concerns a method for preparing nucleic acid binding proteins which are capable of binding nucleic acid. Thus, whilst the solutions provided by the invention will result in a functional nucleic acid binding molecule, it is possible that naturally-occurring zinc finger nucleic acid binding molecules may not follow some or all of the rules provided herein. This does not matter, because the aim of the invention is to permit the design of the nucleic acid binding molecules on the basis of nucleic acid sequence, and not the converse. This is why the rules, in certain instances, provide for a number of possibilities for any given residue. In other instances, alternative residues to those given may be possible. The present invention, thus, does not seek to provide every solution for the design of a binding protein for a given target nucleic acid. It does, however, provide for the first time a complete solution allowing a functional nucleic acid binding protein to be constructed for any given nucleic acid quadruplet.

In a preferred aspect, therefore, the invention provides a method for preparing a nucleic acid binding protein of the Cys2-His2 zinc finger class capable of binding to a nucleic acid quadruplet in a target nucleic acid sequence, wherein binding to each base of the quadruplet by an α-helical zinc finger nucleic acid binding motif in the protein is determined as follows:

-   a) if base 4 in the quadruplet is G, then position +6 in the α-helix is Arg; or position +6 is Ser or Thr and position ++2 is Asp;
-   b) if base 4 in the quadruplet is A, then position +6 in the α-helix is Gln and ++2 is not Asp;
-   c) if base 4 in the quadruplet is T, then position +6 in the α-helix is Ser or Thr and position ++2 is Asp;
-   d) if base 4 in the quadruplet is C, then position +6 in the α-helix may be any amino acid, provided that position ++2 in the α-helix is not Asp;
-   e) if base 3 in the quadruplet is G, then position +3 in the α-helix is His;
-   f) if base 3 in the quadruplet is A, then position +3 in the α-helix is Asn;
-   g) if base 3 in the quadruplet is T, then position +3 in the α-helix is Ala, Ser or Val; provided that if it is Ala, then one of the residues at −1 or +6 is a small residue;
-   h) if base 3 in the quadruplet is C, then position +3 in the α-helix is Ser, Asp, Glu, Leu, Thr or Val;
-   i) if base 2 in the quadruplet is G, then position −1 in the α-helix is Arg;
-   j) if base 2 in the quadruplet is A, then position −1 in the α-helix is Gln;
-   k) if base 2 in the quadruplet is T, then position −1 in the α-helix is Asn or Gln;
-   l) if base 2 in the quadruplet is C, then position −1 in the α-helix is Asp;
-   m) if base 1 in the quadruplet is G, then position +2 is Asp;
-   n) if base 1 in the quadruplet is A, then position +2 is not Asp;
-   o) if base 1 in the quadruplet is C, then position +2 is not Asp;
-   p) if base 1 in the quadruplet is T, then position +2 is Ser or Thr.
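Rules a) to p) amount to four lookup tables, one per contacting helix position. The following is a minimal sketch of that reading, not a design tool: the table and function names are hypothetical, the rule text is transcribed as descriptive strings, and the quadruplet is assumed to be given as bases 1 to 4 in order (that is, read 3′ to 5′).

```python
# Rule tables transcribed from items a) to p). Keys are bases; values describe
# the residue(s) called for at the helix position that contacts that base.
POS_PLUS6_FOR_BASE4 = {
    "G": "Arg; or Ser or Thr with ++2 = Asp",
    "A": "Gln, with ++2 not Asp",
    "T": "Ser or Thr, with ++2 = Asp",
    "C": "any amino acid, with ++2 not Asp",
}
POS_PLUS3_FOR_BASE3 = {
    "G": "His",
    "A": "Asn",
    "T": "Ala, Ser or Val (if Ala, one of -1 or +6 is a small residue)",
    "C": "Ser, Asp, Glu, Leu, Thr or Val",
}
POS_MINUS1_FOR_BASE2 = {"G": "Arg", "A": "Gln", "T": "Asn or Gln", "C": "Asp"}
POS_PLUS2_FOR_BASE1 = {"G": "Asp", "A": "not Asp", "C": "not Asp", "T": "Ser or Thr"}


def rules_for_quadruplet(quadruplet):
    """Return the rule text for helix positions -1, +2, +3 and +6.

    `quadruplet` lists bases 1 to 4 in order, base 1 being the 3' base.
    """
    q = quadruplet.upper()
    if len(q) != 4 or any(base not in "GATC" for base in q):
        raise ValueError("expected four bases drawn from G, A, T, C")
    base1, base2, base3, base4 = q
    return {
        "-1": POS_MINUS1_FOR_BASE2[base2],
        "+2": POS_PLUS2_FOR_BASE1[base1],
        "+3": POS_PLUS3_FOR_BASE3[base3],
        "+6": POS_PLUS6_FOR_BASE4[base4],
    }


# Illustrative quadruplet: base 1 = G, base 2 = T, base 3 = A, base 4 = C.
for position, rule in rules_for_quadruplet("GTAC").items():
    print(position, "->", rule)
```

The ++2 conditions are kept inside the +6 entries because they constrain the adjacent C-terminal finger rather than the finger being designed.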

The foregoing represents a set of rules which permits the design of a zinc finger binding protein specific for any given nucleic acid sequence. A novel finding related thereto is that position +2 in the helix is responsible for determining the binding to base 1 of the quadruplet. In doing so, it cooperates synergistically with position +6, which determines binding at base 4 in the quadruplet, bases 1 and 4 being overlapping in adjacent quadruplets.

A zinc finger binding motif is a structure well known to those in the art and defined in, for example, Miller et al., (1985) EMBO J. 4:1609-1614; Berg (1988) PNAS (USA) 85:99-102; Lee et al., (1989) Science 245:635-637; see International patent applications WO 96/06166 and WO 96/32475, corresponding to U.S. Ser. No. 08/422,107, incorporated herein by reference.

As used herein, “nucleic acid” refers to both RNA and DNA, constructed from natural nucleic acid bases or synthetic bases, or mixtures thereof. Preferably, however, the binding proteins of the invention are DNA binding proteins.

In general, a preferred zinc finger framework has the structure:

X₀₋₂CX₁₋₅CX₉₋₁₄HX₃₋₆H/C  (A)

where X is any amino acid, and the numbers in subscript indicate the possible numbers of residues represented by X.
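Structure (A) translates directly into a regular expression. The sketch below is illustrative only: it assumes sequences are written in the standard one-letter code (so artificial analogues are not represented), and the names `AA`, `FRAMEWORK_A` and `matches_framework_a` are hypothetical. The test sequence is the first consensus structure quoted later in this specification, with the leading P and trailing T G trimmed so that it fits the framework limits.

```python
import re

# Assumption: "any amino acid" is restricted here to the 20 standard
# one-letter residue codes.
AA = "ACDEFGHIKLMNPQRSTVWY"

# X(0-2) C X(1-5) C X(9-14) H X(3-6) H/C, as in structure (A).
FRAMEWORK_A = re.compile(
    rf"[{AA}]{{0,2}}C[{AA}]{{1,5}}C[{AA}]{{9,14}}H[{AA}]{{3,6}}[HC]"
)


def matches_framework_a(sequence):
    """Return True if the whole sequence fits framework (A)."""
    return FRAMEWORK_A.fullmatch(sequence.upper()) is not None


print(matches_framework_a("YKCPECGKSFSQKSDLVKHQRTH"))
print(matches_framework_a("YKCPECGKSFSH"))
```

`fullmatch` is used so that extra flanking residues cause a rejection; `FRAMEWORK_A.search` could be used instead to locate a finger inside a longer polypeptide.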

In a preferred aspect of the present invention, zinc finger nucleic acid binding motifs may be represented as motifs having the following primary structure (SEQ ID NO: 3):

X^(a) C X₂₋₄ C X₂₋₃ F X^(c) X X X X L X X H X X X^(b) H-linker  (B)

in which the ten residues X X X X L X X H X X following X^(c) occupy α-helix positions −1, 1, 2, 3, 4, 5, 6, 7, 8 and 9 respectively, and wherein X (including X^(a), X^(b) and X^(c)) is any amino acid. X₂₋₄ and X₂₋₃ refer to the presence of 2 or 4, or 2 or 3, amino acids, respectively. The Cys and His residues, which together co-ordinate the zinc metal atom, are marked in bold text and are usually invariant, as is the Leu residue at position +4 in the α-helix.

Modifications to this representation may occur or be effected without necessarily abolishing zinc finger function, by insertion, mutation or deletion of amino acids. For example it is known that the second His residue may be replaced by Cys (Krizek et al., (1991) J. Am. Chem. Soc. 113:4518-4523) and that Leu at +4 can in some circumstances be replaced with Arg. The Phe residue before X^(c) may be replaced by any aromatic other than Trp. Moreover, experiments have shown that departure from the preferred structure and residue assignments for the zinc finger are tolerated and may even prove beneficial in binding to certain nucleic acid sequences. Even taking this into account, however, the general structure involving an α-helix co-ordinated by a zinc atom which contacts four Cys or His residues, does not alter. As used herein, structures (A) and (B) above are taken as an exemplary structure representing all zinc finger structures of the Cys2-His2 type.

Preferably, X^(a) is F/Y-X or P-F/Y-X. In this context, X is any amino acid. Preferably, in this context X is E, K, T or S. Less preferred but also envisaged are Q, V, A and P. The remaining amino acids remain possible.

Preferably, X₂₋₄ consists of two amino acids rather than four. The first of these amino acids may be any amino acid, but S, E, K, T, P and R are preferred. Advantageously, it is P or R. The second of these amino acids is preferably E, although any amino acid may be used.

Preferably, X^(b) is T or I.

Preferably, X^(c) is S or T.

Preferably, X₂₋₃ is G-K-A, G-K-C, G-K-S or G-K-G. However, departures from the preferred residues are possible, for example in the form of M-R-N or M-R.

Preferably, the linker is T-G-E-K (SEQ ID NO: 4) or T-G-E-K-P (SEQ ID NO: 5).

As set out above, the major binding interactions occur with amino acids −1, +2, +3 and +6. Amino acids +4 and +7 are largely invariant. The remaining amino acids may be essentially any amino acids. Preferably, position +9 is occupied by Arg or Lys. Advantageously, positions +1, +5 and +8 are not hydrophobic amino acids, that is to say are not Phe, Trp or Tyr.

In a most preferred aspect, therefore, bringing together the above, the invention allows the definition of every residue in a zinc finger nucleic acid binding motif which will bind specifically to a given nucleic acid quadruplet.

The code provided by the present invention is not entirely rigid; certain choices are provided. For example, positions +1, +5 and +8 may have any amino acid allocation, whilst other positions may have certain options: for example, the present rules provide that, for binding to a central T residue, any one of Ala, Ser or Val may be used at +3. In its broadest sense, therefore, the present invention provides a very large number of proteins which are capable of binding to every defined target nucleic acid quadruplet.

Preferably, however, the number of possibilities may be significantly reduced. For example, the non-critical residues +1, +5 and +8 may be occupied by the residues Lys, Thr and Gln respectively as a default option. In the case of the other choices, for example, the first-given option may be employed as a default. Thus, the code according to the present invention allows the design of a single, defined polypeptide (a “default” polypeptide) which will bind to its target quadruplet.
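The default option can be sketched as a function that fills the non-contacting helix positions around four chosen contact residues. This is an illustration under stated assumptions: Lys, Thr and Gln at +1, +5 and +8 come from the paragraph above; Leu at +4 and His at +7 are taken from structure (B); Arg at +9 is the first-given of the preferred "Arg or Lys"; and the function name and one-letter interface are hypothetical.

```python
def default_helix(minus1, plus2, plus3, plus6, plus9="R"):
    """Build the residues at helix positions -1 to +9 in one-letter code.

    The caller supplies the four base-contacting residues (-1, +2, +3, +6),
    chosen according to the recognition rules; the rest are defaults.
    """
    residues = {
        -1: minus1,
        1: "K",     # default for non-critical +1 (Lys)
        2: plus2,
        3: plus3,
        4: "L",     # usually invariant Leu
        5: "T",     # default for non-critical +5 (Thr)
        6: plus6,
        7: "H",     # zinc-coordinating His
        8: "Q",     # default for non-critical +8 (Gln)
        9: plus9,   # preferably Arg or Lys
    }
    order = [-1, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    return "".join(residues[position] for position in order)


# Illustrative choice of contact residues: -1 Arg, +2 Asp, +3 His, +6 Thr.
print(default_helix("R", "D", "H", "T"))
```

Where a rule reads "not Asp" or "any amino acid", the text gives no single default, so the choice at that position is left to the caller.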

In a further aspect of the present invention, there is provided a method for preparing a nucleic acid binding protein of the Cys2-His2 zinc finger class capable of binding to a target nucleic acid sequence, comprising the steps of:

-   a) selecting a model zinc finger domain from the group consisting of naturally occurring zinc fingers and consensus zinc fingers; and
-   b) mutating one or more of positions −1, +2, +3 and +6 of the finger as required according to the rules set forth above.

In general, naturally occurring zinc fingers may be selected from those fingers for which the nucleic acid binding specificity is known. For example, these may be the fingers for which a crystal structure has been resolved: namely Zif 268 (Elrod-Erickson et al., (1996) Structure 4:1171-1180), GLI (Pavletich and Pabo, (1993) Science 261:1701-1707), Tramtrack (Fairall et al., (1993) Nature 366:483-487) and YY1 (Houbaviy et al., (1996) PNAS (USA) 93:13577-13582).

The naturally occurring zinc finger 2 in Zif 268 makes an excellent starting point from which to engineer a zinc finger and is preferred.

Consensus zinc finger structures may be prepared by comparing the sequences of known zinc fingers, irrespective of whether their binding domain is known. Preferably, the consensus structure is selected from the group consisting of the consensus structure P Y K C P E C G K S F S Q K S D L V K H Q R T H T G (SEQ ID NO: 6), and the consensus structure P Y K C S E C G K A F S Q K S N L T R H Q R I H T G E K P (SEQ ID NO: 7).

The consensuses are derived from the consensus provided by Krizek et al., (1991) J. Am. Chem. Soc. 113:4518-4523 and from Jacobs, (1993) PhD thesis, University of Cambridge, UK. In both cases, the linker sequences described above for joining two zinc finger motifs together, namely TGEK (SEQ ID NO: 4) or TGEKP (SEQ ID NO: 5), can be formed on the ends of the consensus. Thus, a P may be removed where necessary, or, in the case of the consensus terminating T G, E K (P) can be added.

When the nucleic acid specificity of the model finger selected is known, the mutation of the finger in order to modify its specificity to bind to the target nucleic acid may be directed to residues known to affect binding to bases at which the natural and desired targets differ. Otherwise, mutation of the model fingers should be concentrated upon residues −1, +2, +3 and +6 as provided for in the foregoing rules.

In order to produce a binding protein having improved binding, moreover, the rules provided by the present invention may be supplemented by physical or virtual modelling of the protein/nucleic acid interface in order to assist in residue selection.

Zinc finger binding motifs designed according to the invention may be combined into nucleic acid binding proteins having a multiplicity of zinc fingers. Preferably, the proteins have at least two zinc fingers. In nature, zinc finger binding proteins commonly have at least three zinc fingers, although two-zinc finger proteins such as Tramtrack are known. The presence of at least three zinc fingers is preferred. Binding proteins may be constructed by joining the required fingers end to end, N-terminus to C-terminus. Preferably, this is effected by joining together the relevant nucleic acid coding sequences encoding the zinc fingers to produce a composite coding sequence encoding the entire binding protein. The invention therefore provides a method for producing a nucleic acid binding protein as defined above, wherein the nucleic acid binding protein is constructed by recombinant DNA technology, the method comprising the steps of:

-   a) preparing a nucleic acid coding sequence encoding two or more zinc finger binding motifs as defined above, placed N-terminus to C-terminus;
-   b) inserting the nucleic acid sequence into a suitable expression vector; and
-   c) expressing the nucleic acid sequence in a host organism in order to obtain the nucleic acid binding protein.

A “leader” peptide may be added to the N-terminal finger. Preferably, the leader peptide is MAEEKP (SEQ ID NO: 8).
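At the amino acid level, the end-to-end joining described above can be sketched as simple concatenation. The sketch assumes that each finger is supplied as a one-letter sequence without its linker, that the linker is placed only between fingers (the text does not say what follows the last finger), and that the leader precedes the N-terminal finger; the function name and the placeholder finger strings are hypothetical.

```python
def assemble_protein(fingers, linker="TGEKP", leader="MAEEKP"):
    """Join finger sequences N-terminus to C-terminus.

    `fingers` is ordered from the N-terminal finger to the C-terminal one.
    The preferred linker T-G-E-K-P and leader MAEEKP are used by default.
    """
    if len(fingers) < 2:
        raise ValueError("at least two zinc fingers are expected")
    return leader + linker.join(fingers)


# Placeholder strings stand in for real finger sequences.
print(assemble_protein(["F1", "F2", "F3"]))
```

In practice the joining is carried out on the coding nucleic acid, as set out in steps a) to c); the sketch only shows the order of the resulting polypeptide segments.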

The nucleic acid encoding the nucleic acid binding protein according to the invention can be incorporated into vectors for further manipulation. As used herein, vector (or plasmid) refers to discrete elements that are used to introduce heterologous nucleic acid into cells for either expression or replication thereof. Selection and use of such vehicles are well within the skill of the person of ordinary skill in the art. Many vectors are available, and selection of appropriate vector will depend on the intended use of the vector, i.e. whether it is to be used for DNA amplification or for nucleic acid expression, the size of the DNA to be inserted into the vector, and the host cell to be transformed with the vector. Each vector contains various components depending on its function (amplification of DNA or expression of DNA) and the host cell for which it is compatible. The vector components generally include, but are not limited to, one or more of the following: an origin of replication, one or more marker genes, an enhancer element, a promoter, a transcription termination sequence and a signal sequence.

Both expression and cloning vectors generally contain a nucleic acid sequence that enables the vector to replicate in one or more selected host cells. Typically in cloning vectors, this sequence is one that enables the vector to replicate independently of the host chromosomal DNA, and includes origins of replication or autonomously replicating sequences. Such sequences are well known for a variety of bacteria, yeast and viruses. The origin of replication from the plasmid pBR322 is suitable for most Gram-negative bacteria, the 2μ plasmid origin is suitable for yeast, and various viral origins (e.g. SV40, polyoma, adenovirus) are useful for cloning vectors in mammalian cells. Generally, the origin of replication component is not needed for mammalian expression vectors unless these are used in mammalian cells competent for high level DNA replication, such as COS cells.

Most expression vectors are shuttle vectors, i.e. they are capable of replication in at least one class of organisms but can be transfected into another class of organisms for expression. For example, a vector is cloned in E. coli and then the same vector is transfected into yeast or mammalian cells even though it is not capable of replicating independently of the host cell chromosome. DNA may also be replicated by insertion into the host genome. However, the recovery of genomic DNA encoding the nucleic acid binding protein is more complex than that of exogenously replicated vector because restriction enzyme digestion is required to excise nucleic acid binding protein DNA. DNA can be amplified by PCR and be directly transfected into the host cells without any replication component.

Advantageously, an expression and cloning vector may contain a selection gene, also referred to as a selectable marker. This gene encodes a protein necessary for the survival or growth of transformed host cells grown in a selective culture medium. Host cells not transformed with the vector containing the selection gene will not survive in the culture medium. Typical selection genes encode proteins that confer resistance to antibiotics and other toxins, e.g. ampicillin, neomycin, methotrexate or tetracycline, complement auxotrophic deficiencies, or supply critical nutrients not available from complex media.

As to a selective gene marker appropriate for yeast, any marker gene can be used which facilitates the selection for transformants due to the phenotypic expression of the marker gene. Suitable markers for yeast are, for example, those conferring resistance to antibiotics G418, hygromycin or bleomycin, or those providing for prototrophy in an auxotrophic yeast mutant, for example the URA3, LEU2, LYS2, TRP1, or HIS3 gene.

Since the replication of vectors is conveniently done in E. coli, an E. coli genetic marker and an E. coli origin of replication are advantageously included. These can be obtained from E. coli plasmids, such as pBR322, Bluescript© vector or a pUC plasmid, e.g. pUC18 or pUC19, which contain both E. coli replication origin and E. coli genetic marker conferring resistance to antibiotics, such as ampicillin.

Suitable selectable markers for mammalian cells are those that enable the identification of cells competent to take up nucleic acid binding protein nucleic acid, such as dihydrofolate reductase (DHFR, methotrexate resistance), thymidine kinase, or genes conferring resistance to G418 or hygromycin. The mammalian cell transformants are placed under selection pressure which only those transformants which have taken up and are expressing the marker are uniquely adapted to survive. In the case of a DHFR or glutamine synthase (GS) marker, selection pressure can be imposed by culturing the transformants under conditions in which the pressure is progressively increased, thereby leading to amplification (at its chromosomal integration site) of both the selection gene and the linked DNA that encodes the nucleic acid binding protein. Amplification is the process by which genes in greater demand for the production of a protein critical for growth, together with closely associated genes which may encode a desired protein, are reiterated in tandem within the chromosomes of recombinant cells. Increased quantities of desired protein are usually synthesised from thus amplified DNA.

Expression and cloning vectors usually contain a promoter that is recognised by the host organism and is operably linked to nucleic acid binding protein encoding nucleic acid. Such a promoter may be inducible or constitutive. The promoters are operably linked to DNA encoding the nucleic acid binding protein by removing the promoter from the source DNA by restriction enzyme digestion and inserting the isolated promoter sequence into the vector. Both the native nucleic acid binding protein promoter sequence and many heterologous promoters may be used to direct amplification and/or expression of nucleic acid binding protein encoding DNA.

Promoters suitable for use with prokaryotic hosts include, for example, the β-lactamase and lactose promoter systems, alkaline phosphatase, the tryptophan (trp) promoter system and hybrid promoters such as the tac promoter. Their nucleotide sequences have been published, thereby enabling the skilled worker operably to ligate them to DNA encoding nucleic acid binding protein, using linkers or adapters to supply any required restriction sites. Promoters for use in bacterial systems will also generally contain a Shine-Dalgarno sequence operably linked to the DNA encoding the nucleic acid binding protein.

Preferred expression vectors are bacterial expression vectors which comprise a promoter of a bacteriophage such as phage λ or T7 which is capable of functioning in the bacteria. In one of the most widely used expression systems, the nucleic acid encoding the fusion protein may be transcribed from the vector by T7 RNA polymerase (Studier et al., Methods in Enzymol. 185:60-89, 1990). In the E. coli BL21(DE3) host strain, used in conjunction with pET vectors, the T7 RNA polymerase is produced from the λ-lysogen DE3 in the host bacterium, and its expression is under the control of the IPTG inducible lac UV5 promoter. This system has been employed successfully for over-production of many proteins. Alternatively the polymerase gene may be introduced on a lambda phage by infection with an int- phage such as the CE6 phage, which is commercially available (Novagen, Madison, USA). Other vectors include vectors containing the lambda PL promoter such as pLEX (Invitrogen, NL), vectors containing the trc promoters such as pTrcHisXpress™ (Invitrogen) or pTrc99 (Pharmacia Biotech, SE), or vectors containing the tac promoter such as pKK223-3 (Pharmacia Biotech) or pMAL (New England Biolabs, MA, USA).

Moreover, the nucleic acid binding protein gene according to the invention preferably includes a secretion sequence in order to facilitate secretion of the polypeptide from bacterial hosts, such that it will be produced as a soluble native peptide rather than in an inclusion body. The peptide may be recovered from the bacterial periplasmic space, or the culture medium, as appropriate.

Suitable promoting sequences for use with yeast hosts may be regulated or constitutive and are preferably derived from a highly expressed yeast gene, especially a Saccharomyces cerevisiae gene. Thus, the promoter of the TRP1 gene, the ADHI or ADHII gene, the acid phosphatase (PHO5) gene, a promoter of the yeast mating pheromone genes coding for the a- or α-factor or a promoter derived from a gene encoding a glycolytic enzyme such as the promoter of the enolase, glyceraldehyde-3-phosphate dehydrogenase (GAP), 3-phosphoglycerate kinase (PGK), hexokinase, pyruvate decarboxylase, phosphofructokinase, glucose-6-phosphate isomerase, 3-phosphoglycerate mutase, pyruvate kinase, triose phosphate isomerase, phosphoglucose isomerase or glucokinase genes, or a promoter from the TATA binding protein (TBP) gene can be used. Furthermore, it is possible to use hybrid promoters comprising upstream activation sequences (UAS) of one yeast gene and downstream promoter elements including a functional TATA box of another yeast gene, for example a hybrid promoter including the UAS(s) of the yeast PHO5 gene and downstream promoter elements including a functional TATA box of the yeast GAP gene (PHO5-GAP hybrid promoter). A suitable constitutive PHO5 promoter is e.g. a shortened acid phosphatase PHO5 promoter devoid of the upstream regulatory elements (UAS) such as the PHO5 (−173) promoter element starting at nucleotide −173 and ending at nucleotide −9 of the PHO5 gene.

Nucleic acid binding protein gene transcription from vectors in mammalian hosts may be controlled by promoters derived from the genomes of viruses such as polyoma virus, adenovirus, fowlpox virus, bovine papilloma virus, avian sarcoma virus, cytomegalovirus (CMV), a retrovirus and Simian Virus 40 (SV40), from heterologous mammalian promoters such as the actin promoter or a very strong promoter, e.g. a ribosomal protein promoter, and from the promoter normally associated with nucleic acid binding protein sequence, provided such promoters are compatible with the host cell systems.

Transcription of a DNA encoding nucleic acid binding protein by higher eukaryotes may be increased by inserting an enhancer sequence into the vector. Enhancers are relatively orientation and position independent. Many enhancer sequences are known from mammalian genes (e.g. elastase and globin). However, typically one will employ an enhancer from a eukaryotic cell virus. Examples include the SV40 enhancer on the late side of the replication origin (bp 100-270) and the CMV early promoter enhancer. The enhancer may be spliced into the vector at a position 5′ or 3′ to nucleic acid binding protein DNA, but is preferably located at a site 5′ from the promoter.

Advantageously, a eukaryotic expression vector encoding a nucleic acid binding protein according to the invention may comprise a locus control region (LCR). LCRs are capable of directing high-level, integration site-independent expression of transgenes integrated into host cell chromatin, which is of importance especially where the nucleic acid binding protein gene is to be expressed in the context of a permanently-transfected eukaryotic cell line in which chromosomal integration of the vector has occurred, or in transgenic animals.

Eukaryotic vectors may also contain sequences necessary for the termination of transcription and for stabilising the mRNA. Such sequences are commonly available from the 5′ and 3′ untranslated regions of eukaryotic or viral DNAs or cDNAs. These regions contain nucleotide segments transcribed as polyadenylated fragments in the untranslated portion of the mRNA encoding nucleic acid binding protein.

An expression vector includes any vector capable of expressing nucleic acid binding protein nucleic acids that are operatively linked with regulatory sequences, such as promoter regions, that are capable of expression of such DNAs. Thus, an expression vector refers to a recombinant DNA or RNA construct, such as a plasmid, a phage, recombinant virus or other vector, that upon introduction into an appropriate host cell, results in expression of the cloned DNA. Appropriate expression vectors are well known to those with ordinary skill in the art and include those that are replicable in eukaryotic and/or prokaryotic cells and those that remain episomal or those which integrate into the host cell genome. For example, DNAs encoding nucleic acid binding protein may be inserted into a vector suitable for expression of cDNAs in mammalian cells, e.g. a CMV enhancer-based vector such as pEVRF (Matthias, et al., (1989) NAR 17, 6418).

Particularly useful for practising the present invention are expression vectors that provide for the transient expression of DNA encoding nucleic acid binding protein in mammalian cells. Transient expression usually involves the use of an expression vector that is able to replicate efficiently in a host cell, such that the host cell accumulates many copies of the expression vector, and, in turn, synthesises high levels of nucleic acid binding protein. For the purposes of the present invention, transient expression systems are useful e.g. for identifying nucleic acid binding protein mutants, to identify potential phosphorylation sites or to characterise functional domains of the protein.

Construction of vectors according to the invention employs conventional ligation techniques. Isolated plasmids or DNA fragments are cleaved, tailored, and religated in the form desired to generate the plasmids required. If desired, analysis to confirm correct sequences in the constructed plasmids is performed in a known fashion. Suitable methods for constructing expression vectors, preparing in vitro transcripts, introducing DNA into host cells, and performing analyses for assessing nucleic acid binding protein expression and function are known to those skilled in the art. Gene presence, amplification and/or expression may be measured in a sample directly, for example, by conventional Southern blotting, Northern blotting to quantitate the transcription of mRNA, dot blotting (DNA or RNA analysis), or in situ hybridisation, using an appropriately labelled probe which may be based on a sequence provided herein. Those skilled in the art will readily envisage how these methods may be modified, if desired.

In accordance with another embodiment of the present invention, there are provided cells containing the above-described nucleic acids. Such host cells, such as prokaryote, yeast and higher eukaryote cells, may be used for replicating DNA and producing the nucleic acid binding protein. Suitable prokaryotes include eubacteria, such as Gram-negative or Gram-positive organisms, such as E. coli, e.g. E. coli K-12 strains, DH5α and HB101, or Bacilli. Further hosts suitable for the nucleic acid binding protein encoding vectors include eukaryotic microbes such as filamentous fungi or yeast, e.g. Saccharomyces cerevisiae. Higher eukaryotic cells include insect and vertebrate cells, particularly mammalian cells including human cells, or nucleated cells from other multicellular organisms. In recent years propagation of vertebrate cells in culture (tissue culture) has become a routine procedure. Examples of useful mammalian host cell lines are epithelial or fibroblastic cell lines such as Chinese hamster ovary (CHO) cells, NIH 3T3 cells, HeLa cells or 293T cells. The host cells referred to in this disclosure comprise cells in in vitro culture as well as cells that are within a host animal.

DNA may be stably incorporated into cells or may be transiently expressed using methods known in the art. Stably transfected mammalian cells may be prepared by transfecting cells with an expression vector having a selectable marker gene, and growing the transfected cells under conditions selective for cells expressing the marker gene. To prepare transient transfectants, mammalian cells are transfected with a reporter gene to monitor transfection efficiency.

To produce such stably or transiently transfected cells, the cells should be transfected with a sufficient amount of the nucleic acid binding protein-encoding nucleic acid to form the nucleic acid binding protein. The precise amounts of DNA encoding the nucleic acid binding protein may be empirically determined and optimised for a particular cell and assay.

Host cells are transfected or, preferably, transformed with the above-captioned expression or cloning vectors of this invention and cultured in conventional nutrient media modified as appropriate for inducing promoters, selecting transformants, or amplifying the genes encoding the desired sequences. Heterologous DNA may be introduced into host cells by any method known in the art, such as transfection with a vector encoding a heterologous DNA by the calcium phosphate coprecipitation technique or by electroporation. Numerous methods of transfection are known to the skilled worker in the field. Successful transfection is generally recognised when any indication of the operation of this vector occurs in the host cell. Transformation is achieved using standard techniques appropriate to the particular host cells used.

Incorporation of cloned DNA into a suitable expression vector, transfection of eukaryotic cells with a plasmid vector or a combination of plasmid vectors, each encoding one or more distinct genes or with linear DNA, and selection of transfected cells are well known in the art (see, e.g. Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press).

Transfected or transformed cells are cultured using media and culturing methods known in the art, preferably under conditions whereby the nucleic acid binding protein encoded by the DNA is expressed. The composition of suitable media is known to those in the art, so that they can be readily prepared. Suitable culturing media are also commercially available.

In a further aspect, the invention also provides means by which the binding of the protein designed according to the rules can be improved by randomising the proteins and selecting for improved binding. In this aspect, the present invention represents an improvement of the method set forth in WO 96/06166. Thus, zinc finger molecules designed according to the invention may be subjected to limited randomisation and subsequent selection, such as by phage display, in order to optimise the binding characteristics of the molecule.

Preferably, therefore, the method according to the invention comprises the further steps of randomising the sequence of the zinc finger binding motifs at selected sites, screening the randomised molecules obtained and selecting the molecules having the most advantageous properties. Generally, those molecules showing higher affinity and/or specificity for the target nucleic acid sequence are selected.

Mutagenesis and screening of target nucleic acid molecules may be achieved by any suitable means. Preferably, the mutagenesis is performed at the nucleic acid level, for example by synthesising novel genes encoding mutant proteins and expressing these to obtain a variety of different proteins. Alternatively, existing genes can themselves be mutated, such as by site-directed or random mutagenesis, in order to obtain the desired mutant genes.

Mutations may be performed by any method known to those of skill in the art. Preferred, however, is site-directed mutagenesis of a nucleic acid sequence encoding the protein of interest. A number of methods for site-directed mutagenesis are known in the art, from methods employing single-stranded phage such as M13 to PCR-based techniques (see “PCR Protocols: A guide to methods and applications”, M. A. Innis, D. H. Gelfand, J. J. Sninsky, T. J. White (eds.), Academic Press, New York, 1990). Preferably, the commercially available Altered Site II Mutagenesis System (Promega) may be employed, according to the directions given by the manufacturer.

Screening of the proteins produced by mutant genes is preferably performed by expressing the genes and assaying the binding ability of the protein product. A simple and advantageously rapid method by which this may be accomplished is by phage display, in which the mutant polypeptides are expressed as fusion proteins with the coat proteins of filamentous bacteriophage, such as the minor coat protein pIII of bacteriophage M13 or gene III of bacteriophage Fd, and displayed on the capsid of bacteriophage transformed with the mutant genes. The target nucleic acid sequence is used as a probe to bind directly to the protein on the phage surface and select the phage possessing advantageous mutants, by affinity purification. The phage are then amplified by passage through a bacterial host, and subjected to further rounds of selection and amplification in order to enrich the mutant pool for the desired phage and eventually isolate the preferred clone(s). Detailed methodology for phage display is known in the art and set forth, for example, in U.S. Pat. No. 5,223,409; Choo and Klug, (1995) Current Opinion in Biotechnology 6:431-436; Smith, (1985) Science 228:1315-1317; and McCafferty et al., (1990) Nature 348:552-554; all incorporated herein by reference. Vector systems and kits for phage display are available commercially, for example from Pharmacia.

Randomisation of the zinc finger binding motifs produced according to the invention is preferably directed to those residues where the code provided herein gives a choice of residues. For example, therefore, positions +1, +5 and +8 are advantageously randomised, whilst preferably avoiding hydrophobic amino acids; positions involved in binding to the nucleic acid, notably −1, +2, +3 and +6, may be randomised also, preferably within the choices provided by the rules of the present invention.

Preferably, therefore, the “default” protein produced according to the rules provided by the invention can be improved by subjecting the protein to one or more rounds of randomisation and selection within the specified parameters.

Nucleic acid binding proteins according to the invention may be employed in a wide variety of applications, including diagnostics and as research tools. Advantageously, they may be employed as diagnostic tools for identifying the presence of nucleic acid molecules in a complex mixture. Nucleic acid binding molecules according to the invention can differentiate single base pair changes in target nucleic acid molecules.

Accordingly, the invention provides a method for determining the presence of a target nucleic acid molecule, comprising the steps of:

-   a) preparing a nucleic acid binding protein by the method set forth above which is specific for the target nucleic acid molecule;
-   b) exposing a test system comprising the target nucleic acid molecule to the nucleic acid binding protein under conditions which promote binding, and removing any nucleic acid binding protein which remains unbound;
-   c) detecting the presence of the nucleic acid binding protein in the test system.

In a preferred embodiment, the nucleic acid binding molecules of the invention can be incorporated into an ELISA assay. For example, phage displaying the molecules of the invention can be used to detect the presence of the target nucleic acid, and visualised using enzyme-linked anti-phage antibodies.

Further improvements to the use of zinc finger phage for diagnosis can be made, for example, by co-expressing a marker protein fused to the major coat protein (gVIII) of bacteriophage. Since detection with an anti-phage antibody would then be obsolete, the time and cost of each diagnosis would be further reduced. Depending on the requirements, suitable markers for display might include the fluorescent proteins (A. B. Cubitt, et al., (1995) Trends Biochem Sci. 20, 448-455; T. T. Yang, et al., (1996) Gene 173, 19-23), or an enzyme such as alkaline phosphatase which has been previously displayed on gIII (J. McCafferty, R. H. Jackson, D. J. Chiswell, (1991) Protein Engineering 4, 955-961). Labelling different types of diagnostic phage with distinct markers would allow multiplex screening of a single nucleic acid sample. Nevertheless, even in the absence of such refinements, the basic ELISA technique is reliable, fast, simple and particularly inexpensive. Moreover it requires no specialised apparatus, nor does it employ hazardous reagents such as radioactive isotopes, making it amenable to routine use in the clinic. The major advantage of the protocol is that it obviates the requirement for gel electrophoresis, and so opens the way to automated nucleic acid diagnosis.

The invention provides nucleic acid binding proteins which can be engineered with exquisite specificity. The invention lends itself, therefore, to the design of any molecule of which specific nucleic acid binding is required. For example, the proteins according to the invention may be employed in the manufacture of chimeric restriction enzymes, in which a nucleic acid cleaving domain is fused to a nucleic acid binding domain comprising a zinc finger as described herein.

The invention is described below, for the purpose of illustration only, in the following examples, with reference to the figures, in which:

FIG. 1 (SEQ ID NOS. 12, 13, 14, 9 & 15, in order of appearance) illustrates the design of a zinc finger binding protein specific for a G12V mutant ras oncogene;

FIG. 2 illustrates the binding specificity of the binding protein for the oncogene as opposed to the wild-type ras sequence;

FIG. 3 illustrates the results of an ELISA assay performed using the anti-ras binding protein with both wild-type and mutant target nucleic acid sequences;

FIG. 4 illustrates interactions between the Zif268 DNA-binding domain and DNA. (a) Schematic diagram of modular recognition between the three zinc fingers of Zif268 and triplet subsites of an optimised DNA binding site. Straight arrows indicate the stereochemical juxtapositioning of recognition residues with bases of the contacted G-rich DNA strand. Note that since the N-terminal finger contacts the 3′ end of the DNA and the C-terminal finger the 5′ end, binding to the G-rich strand is said to be antiparallel. (b) View of Zif268 finger 3 bound to DNA, showing the possibility of interaction with both DNA strands. Co-ordinates from Pavletich & Pabo, (1991) Science 252:809-817. (c) The potential hydrogen bonding network between bases on both strands of the DNA and positions −1 (Arg) and 2 (Asp) of finger 3 (Pavletich & Pabo 1991). (d) Schematic diagram of recognition between the three zinc fingers of Zif268 and an optimised DNA binding site including ‘cross-strand’ interactions. Recognition contacts between Asp2 of each finger and the parallel DNA strand (shown by curly arrows) mean that each finger binds overlapping, 4 bp subsites;

FIG. 5 (SEQ ID NOS 16-23, in order of appearance) shows the amino acid sequences of the three finger constructs used in this study, including wild-type Zif268 and four variants selected from a phage display library in which finger 2 is randomised. Boxed regions indicate the varied regions in each construct. The conserved zinc chelating residues of the zinc fingers are underlined. The aspartate in position 2 of finger 3 and the alanine to which it is mutated in this study are both circled;

FIG. 6 (portions of SEQ ID NOS 19-23 in order of appearance) shows the binding site signatures of the middle finger before and after alanine mutagenesis in position 2 of finger 3. The ELISA signal (A₄₅₀-A₆₅₀) showing interaction of zinc finger phage with each positionally randomised DNA library is plotted vertically. From the pattern of binding to these libraries, one or a small number of binding sites can be read off and these are written on the right of the figure. Mutagenesis of position 2 in finger 3 can change the binding specificity for the middle triplet of the Zif268 binding site. In such cases, changes are noted for base 5, but not bases 6 and 7 of the DNA binding site (see FIG. 4 a); and

FIG. 7 depicts the apparent equilibrium binding curves showing the effect of replacing Asp2 in finger 3 by Ala for (a) Zif268 DNA-binding domain (consensus binding site used: 5′-GCG TGG GCG-3′), and (b) F2-Arg construct (consensus binding site used: 5′-GCG GTG GCG-3′). Wild-type and mutant constructs are denoted by ‘wt’ and ‘mut’ respectively.

EXAMPLE 1 Construction of a Zinc Finger Protein

The target selected for the zinc finger nucleic acid binding protein is the activating point mutation of the human EJ bladder carcinoma ras oncogene, which was the first DNA lesion reported to confer transforming properties on a cellular proto-oncogene. Since the original discovery, ras gene mutations have been found to occur at high frequencies in a variety of human cancers and are established targets for the diagnosis of oncogenesis at early stages of tumour growth.

The EJ bladder carcinoma mutation is a single nucleotide change in codon 12 of H-ras, which results in a mutation from GGC to GTC at this position. A zinc finger peptide is designed to bind a 10 bp DNA site assigned in the noncoding strand of the mutant ras gene, such that three fingers contact ‘anticodons’ 10, 11 and 12 in series, as shown in FIG. 1, plus the 5′ preceding G (on the + strand of the DNA). The rationale of this assignment takes into account the fact that zinc fingers make most contacts to one DNA strand, and the mutant noncoding strand carries an adenine which can be strongly discriminated from the cytosine present in the wild-type ras, by a bidentate contact from an asparagine residue.
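The strand assignment described above can be checked with a short sketch. The Python below is illustrative only: the helper `reverse_complement` and the variable names are assumptions introduced for this example and do not form part of the disclosure. It reads codon 12 on the coding strand and derives the corresponding ‘anticodon’ on the noncoding strand for the wild-type and EJ sequences.

```python
# Illustrative sketch: helper and variable names are assumptions, not part of the disclosure.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """Return the opposite DNA strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

wild_type_codon12 = "GGC"  # codon 12 of wild-type H-ras, as stated in the text
mutant_codon12 = "GTC"     # codon 12 of the EJ mutant, as stated in the text

wt_anticodon = reverse_complement(wild_type_codon12)
mut_anticodon = reverse_complement(mutant_codon12)

# Positions (0-based) at which the two noncoding-strand triplets differ.
differences = [
    (i, wt_base, mut_base)
    for i, (wt_base, mut_base) in enumerate(zip(wt_anticodon, mut_anticodon))
    if wt_base != mut_base
]

print(wt_anticodon, mut_anticodon, differences)
```

The two ‘anticodons’ differ only at the middle base, cytosine in the wild type and adenine in the mutant, which is the difference the asparagine contact is intended to read.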

The first finger of the designer lead peptide is designed according to the rules set forth herein starting from a Zif268 finger 2 model to bind the quadruplet 5′-GCCG-3′, which corresponds to ‘anticodon’ 10 of the designated binding site plus one 3′ base. The finger has the following sequence: (SEQ ID NO: 9) F Q C R I C M R N F S D R S S L T R H T R T H T G E K P, in which the α-helical positions −1, 1, 2, 3, 4, 5, 6, 7, 8 and 9 are occupied by the residues D, R, S, S, L, T, R, H, T and R respectively.

A DNA coding sequence encoding this polypeptide is constructed from synthesised oligonucleotides.

Given the similarity of the DNA subsites, the second and third fingers of the DNA-binding domain are direct repeats of this first finger but in which the third α-helical residue which contacts base 3 of a quadruplet, +3, is mutated according to recognition rules, to histidine in finger 2 and asparagine in finger 3, such that the specificity of these fingers is predicted to be 5′-GGCG-3′ (includes ‘anticodon’ 11) and 5′-GACG-3′ (includes ‘anticodon’ 12) respectively. Thus, the second and third finger polypeptides have the sequences (SEQ ID NO: 10) F Q C R I C M R N F S D R S H L T R H T R T H T G E K P and (SEQ ID NO: 11) F Q C R I C M R N F S D R S N L T R H T R T H T G E K respectively.
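The derivation of fingers 2 and 3 from finger 1 by a single substitution at helix position +3 may be sketched as follows. This is a minimal illustration, not the method used to construct the genes: the function names are hypothetical, and the placement of position −1 at the Asp of the DRSSLTR stretch is inferred from the statement that the three fingers differ only at +3. SEQ ID NO: 11 as listed lacks the terminal Pro, so finger 3 is compared over the shared length.

```python
# Illustrative sketch: names and index bookkeeping are assumptions made for this example.
FINGER_1 = "FQCRICMRNFSDRSSLTRHTRTHTGEKP"  # SEQ ID NO: 9 as given in the text

# String index of helix position -1, taken here to be the Asp of "DRSSLTR"
# (an inference from the text, not an explicit statement in it).
HELIX_MINUS_1 = FINGER_1.index("DRSSLTR")

def helix_index(position: int) -> int:
    """Map a helix position (-1, +1, ..., +9; there is no position 0) to a string index."""
    if position == 0 or position < -1 or position > 9:
        raise ValueError("helix positions run -1, +1, ..., +9")
    return HELIX_MINUS_1 + (0 if position == -1 else position)

def substitute(finger: str, position: int, residue: str) -> str:
    """Return a copy of the finger with one helix position replaced."""
    i = helix_index(position)
    return finger[:i] + residue + finger[i + 1:]

finger_2 = substitute(FINGER_1, 3, "H")  # histidine at +3
finger_3 = substitute(FINGER_1, 3, "N")  # asparagine at +3

print(finger_2)
print(finger_3)
```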

A construct consisting of DNA sequences encoding the three fingers joined together, preceded by a leader MAEEKP at the N-terminus, is cloned as a fusion to the minor coat protein (gene III) of bacteriophage Fd in the phage vector Fd-Tet-SN (Y. Choo, A. Klug, (1994) Proc. Natl. Acad. Sci. U.S.A. 91, 11163-11167). In phage display screening, the DNA-binding domain is able to bind the mutated ras sequence with an apparent K_(d) of 17 nM, and to discriminate strongly against the wild-type sequence.

EXAMPLE 2 Improvement of Binding Performance by Selective Randomisation

While a K_(d) of 17 nM is sufficient for most practical applications of DNA-binding proteins, the apparent affinity of the designed protein falls about 5-fold short of the K_(d)s in the nanomolar range which are found for the reaction of wild-type zinc finger proteins with their natural binding sites (Y. Choo, A. Klug, (1994) Proc. Natl. Acad. Sci. U.S.A. 91, 11168-11172).

According to the recognition rules, the first finger of the lead peptide could contact cytosine using one of Asp, Glu, Ser or Thr in the third α-helix position. To determine the optimal contact, the codon for helical position 3 of finger 1 is engineered by cassette mutagenesis to have position 1=A/G, position 2=A/C/G and position 3=C/G. Therefore in addition to Asp, Glu, Ser and Thr, the randomisation also specifies Ala, Arg, Asn, Gly and Lys. Selections from this mini-library are over one round of phage binding to 5 nM mutant DNA oligo in 100 μl PBS containing 5 μM ZnCl₂, 2% (w/v) fat-free dried milk (Marvel) and 1% (v/v) Tween-20, with 1 μg poly dIdC as competitor, followed by six washes with PBS containing 50 μM ZnCl₂ and 1% (v/v) Tween-20. Bound phage are eluted with 0.1M triethylamine for 3 mins, and immediately transferred to an equal volume of 1M Tris-Cl pH 7.4.
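The residues specified by the degenerate codon can be enumerated directly. The sketch below is illustrative: it assumes the standard genetic code, and the variable names are introduced here for the example only. It expands the permitted bases at each codon position and translates every resulting codon.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Bases permitted at codon positions 1, 2 and 3, as stated in the text.
ALLOWED = ("AG", "ACG", "CG")

codons = ["".join(c) for c in product(*ALLOWED)]
encoded = {codon: CODON_TABLE[codon] for codon in codons}
residues = sorted(set(encoded.values()))

for codon, aa in encoded.items():
    print(codon, aa)
print(len(codons), "codons;", "".join(residues))
```

The twelve codons of this mini-library encode nine residues (one-letter codes A, D, E, G, K, N, R, S and T), matching the list given in the text, and none of them is a stop codon.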

A single round of randomisation and selection is found to be sufficient to improve the affinity of the lead zinc finger peptide to this standard. A small library of mutants is constructed with limited variations specifically in the third α-helical position (+3) of finger 1 of the designed peptide. Selection from this library yields an optimised DNA-binding domain with asparagine at the variable position, which is able to bind the mutant ras sequence with an apparent K_(d) of 3 nM, i.e. equal to that of the wild-type Zif268 DNA-binding domain (FIG. 2). The selection of asparagine at this position to bind opposite a cytosine is an unexpected deviation from the recognition rules, which normally pair asparagine with adenine.
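To give a feel for what the change in apparent K_(d) means in practice, the following sketch compares the two values. It assumes a simple 1:1 binding isotherm with DNA in excess over protein, which is a simplification introduced here and not a model stated in the disclosure; the function name `fraction_bound` is likewise hypothetical. The 5 nM DNA concentration is the one used in the selection described in this example.

```python
# Illustrative sketch under an assumed 1:1 binding model (not stated in the disclosure).

def fraction_bound(dna_nM: float, kd_nM: float) -> float:
    """Fraction of protein bound at a given free DNA concentration: [DNA] / (Kd + [DNA])."""
    return dna_nM / (kd_nM + dna_nM)

KD_LEAD_NM = 17.0       # apparent Kd of the designed lead peptide (from the text)
KD_OPTIMISED_NM = 3.0   # apparent Kd after selection (from the text)
DNA_NM = 5.0            # DNA concentration used in the selection (from the text)

fold_improvement = KD_LEAD_NM / KD_OPTIMISED_NM
bound_lead = fraction_bound(DNA_NM, KD_LEAD_NM)
bound_optimised = fraction_bound(DNA_NM, KD_OPTIMISED_NM)

print(f"fold improvement in Kd: {fold_improvement:.1f}")
print(f"fraction bound at {DNA_NM} nM DNA: lead {bound_lead:.2f}, optimised {bound_optimised:.2f}")
```

Under these assumptions the ratio of the two constants is a little over five, consistent with the "about 5-fold" shortfall described for the lead peptide.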

The selection of asparagine is, however, consistent with physical considerations of the protein-DNA interface. In addition to the classical bidentate interaction of asparagine and adenine observed in zinc finger-DNA complexes, asparagine has been observed to bridge a base-pair step in the major groove of DNA, for example in the co-crystal structures of the GCN4 DNA-binding domain. A number of different base-pair steps provide the correct stereochemical pairings of hydrogen bond donors and acceptors which could satisfy asparagine, including the underlined step GCC of ras ‘anticodon’ 10. Although asparagine in position 3 of the zinc finger helix would not normally be positioned to bridge a base-pair step according to the Zif268 model, it is known that a bend in DNA can give scope to non-canonical zinc finger-DNA interactions (L. Fairall, J. W. R. Schwabe, L. Chapman, J. T. Finch, D. Rhodes, (1993) Nature 366, 483-487). The sequence GGC (codon 10) is frequently found on the outside of a bend in the nucleosome core, and has been observed to confer an intrinsic bend in the crystal structure of a decameric DNA oligonucleotide. In the latter case, the bend arises from preferential stacking of the purines: this is associated with a large propeller twist and narrowing of the major groove, both of which would favour bridging of the base-pair step by asparagine (T. E. Ellenberger, C. J. Brandl, K. Struhl, S. C. Harrison, (1992) Cell 71, 1223-1237). Therefore, in addition to explaining the selection of the non-canonical contact in the optimised complex, the sequence-dependent deformation of ras DNA could account for our observation that wild-type and EJ ras gene fragments have different electrophoretic mobility in polyacrylamide gels, since the wild-type ras gene has two GGC sequences 5 bp apart and hence out of helical phase (resulting in no net bend), while the EJ mutation affects one of these GGC sequences.

Thus, while it is possible to engineer an adequate DNA-binding domain by rational design based on recognition rules, the binding affinity of this lead peptide is improved using phage display, leading to the selection of a non-canonical DNA contact.

EXAMPLE 3 Diagnosis of a Ras Mutation Using the Zinc Finger Nucleic Acid Binding Protein

The optimised DNA-binding domain displayed on phage is applied in the diagnosis of the activating point mutation of the EJ ras oncogene. Bacterial culture supernatant containing the diagnostic phage is diluted 1:1 with PBS containing 50 μM ZnCl₂, 4% (w/v) fat-free dried milk (Marvel) and 2% (v/v) Tween-20. Biotinylated oligonucleotides (7.5 pmol) containing double stranded DNA comprising codons 8-16 from the wild type or the point-mutated ras gene are added to 50 μl of the diluted phage and incubated for 1 h at 20° C. In the experiment shown in FIG. 3, bound phage are captured with 0.5 mg streptavidin coated paramagnetic beads (Dynal); however, streptavidin coated microtitre plates (Boehringer Mannheim) can also be used without alteration to the protocol. Unbound phage are removed by washing the beads 6 times with PBS containing 50 μM ZnCl₂ and 1% (v/v) Tween-20. The beads are subsequently incubated for 1 h at RT with anti-M13 IgG conjugated to horseradish peroxidase (Pharmacia Biotech) diluted 1:5000 in PBS containing 50 μM ZnCl₂ and 2% (w/v) fat-free dried milk (Marvel). Excess antibody is removed by washing 6 times with PBS containing 50 μM ZnCl₂ and 0.05% (v/v) Tween, and 3 times with PBS containing 50 μM ZnCl₂. The ELISA is developed with 0.1 mg/ml tetramethylbenzidine (Sigma) in 0.1M sodium acetate pH 5.4 containing 2 μl of fresh 30% hydrogen peroxide per 10 ml buffer, and after approximately 1 min, stopped with an equal volume of 2M H₂SO₄. The reaction produces a yellow colour which is quantitated by subtracting the absorbance at 650 nm from the absorbance at 450 nm. It should be noted that in this protocol the ELISA is not made competitive; however, soluble (non-biotinylated) wild-type ras DNA could be included in the binding reactions, possibly leading to higher discrimination between wild-type and mutant ras.

Phage are retained specifically by DNA bearing the mutant, but not the wild-type ras sequence, allowing the detection of the point mutation by ELISA (FIG. 3).

EXAMPLE 4 Design of an anti-HIV Zinc Finger

The sequence of the HIV TAR, the region of the LTR which is responsible for trans-activation by Tat, is known (Jones and Peterlin, (1994) Ann. Rev. Biochem. 63:717-743). A sequence within the TAR region is identified and a zinc finger polypeptide designed to bind thereto.

The selected sequence is 5′-AGA GAG CTC-3′, which is the complement of nucleotides +34 to +42 of HIV. The corresponding amino acids required in fingers 1, 2 and 3 of a zinc finger binding protein are determined according to the rules set forth above, as follows:

Finger 3: target 5′-AGA-3′; Position −1 Gln; Position +2 Gly; Position +3 His; Position +6 Val

Finger 2: target 5′-GAG-3′; Position −1 Arg; Position +2 Ser; Position +3 Asn; Position +6 Arg

Finger 1: target 5′-CTC-3′; Position −1 Asp; Position +3 Ser; Position +6 Glu
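The assignment of fingers to triplets follows the antiparallel arrangement in which the N-terminal finger contacts the 3′ end of the site. A minimal sketch of that bookkeeping is given below; the function name `assign_fingers` and the dictionary layout are assumptions made for illustration, and the residue choices are simply copied from the listing above rather than computed from the recognition rules.

```python
# Illustrative sketch: names and data layout are assumptions made for this example.
SITE = "AGAGAGCTC"  # 5'-AGA GAG CTC-3', as given in the text

def assign_fingers(site: str) -> dict:
    """Split a site (read 5' to 3') into triplets and pair them with fingers.

    Binding is antiparallel: the highest-numbered (C-terminal) finger takes
    the 5' triplet and finger 1 (N-terminal) takes the 3' triplet.
    """
    if len(site) % 3 != 0:
        raise ValueError("site length must be a multiple of three")
    triplets = [site[i:i + 3] for i in range(0, len(site), 3)]
    return {len(triplets) - i: triplet for i, triplet in enumerate(triplets)}

# Helix residues chosen for each finger, copied from the listing in the text.
# No residue is listed for position +2 of finger 1.
HELIX_RESIDUES = {
    3: {-1: "Gln", 2: "Gly", 3: "His", 6: "Val"},
    2: {-1: "Arg", 2: "Ser", 3: "Asn", 6: "Arg"},
    1: {-1: "Asp", 3: "Ser", 6: "Glu"},
}

targets = assign_fingers(SITE)
for finger in sorted(targets, reverse=True):
    print(f"Finger {finger}: 5'-{targets[finger]}-3'", HELIX_RESIDUES[finger])
```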

The framework of the polypeptide is taken from the Zif268 middle finger. The sequence of the entire polypeptide is shown in SEQ. ID. No. 2.

Residues +2 and +6 of finger 3 are partially selected by randomisation and phage display selection. At position +2, two triplets are used, GAT and GGT, coding for Asp or Gly. Position +6 is randomised. In these positions, the residues Gly and Val are selected. The methodology employed is as follows: colony PCR is performed with one primer containing a single mismatch to create the required randomisations in finger 3. Cloning of PCR product in phage vector is as described previously (Choo, Y. & Klug, A. (1994) Proc. Natl. Acad. Sci. USA 91, 11163-11167; Choo, Y. & Klug, A. (1994) Proc. Natl. Acad. Sci. USA 91, 11168-11172). Briefly, forward and backward PCR primers contain unique restriction sites for Not I or Sfi I respectively and amplify an approximately 300 base pair region encompassing three zinc fingers. PCR products are digested with Sfi I and Not I to create cohesive ends and are ligated to 100 ng of similarly digested fd-Tet-SN vector. Electrocompetent TG1 cells are transformed with the recombinant vector. Single colonies of transformants are grown overnight in 2×TY containing 50 μM ZnCl₂ and 15 μg/ml tetracycline. Single stranded DNA is prepared from phage in the culture supernatant and sequenced with Sequenase 2.0 (United States Biochemical).

The polypeptide designed according to the invention is then tested for binding to HIV DNA and positive results are obtained.

EXAMPLE 5

Alanine mutagenesis of the Asp2 in finger 3 is carried out on the wild-type Zif268 DNA-binding domain and four related peptides isolated from the phage display library as follows (see also FIG. 5):

E. coli TG1 cells are transfected with fd phage displaying zinc fingers. Colony PCR is performed with one primer containing a single mismatch to create the Asp to Ala change in finger 3. Cloning of PCR product in phage vector is as described previously (Choo, Y. & Klug, A. (1994) Proc. Natl. Acad. Sci. USA 91, 11163-11167; Choo, Y. & Klug, A. (1994) Proc. Natl. Acad. Sci. USA 91, 11168-11172). Briefly, forward and backward PCR primers contain unique restriction sites for Not I or Sfi I respectively and amplify an approximately 300 base pair region encompassing three zinc fingers. PCR products are digested with Sfi I and Not I to create cohesive ends and are ligated to 100 ng of similarly digested fd-Tet-SN vector. Electrocompetent TG1 cells are transformed with the recombinant vector. Single colonies of transformants are grown overnight in 2×TY containing 50 μM ZnCl₂ and 15 μg/ml tetracycline. Single stranded DNA is prepared from phage in the culture supernatant and sequenced with Sequenase 2.0 (United States Biochemical).

The peptides are chosen for this experiment on the basis of the identity of the residue at position 6 of the middle finger. Peptide F2-Arg, which contains Arg at position 6 of finger 2, is chosen since it should specify 5′-G in the ‘middle’ cognate triplet regardless of the mutation. On the other hand, the peptide F2-Gly with Gly at position 6 would be expected to lose all specificity at the 5′ position of the ‘middle’ triplet following alanine mutagenesis in finger 3. The other two peptides analysed, F2-Val and F2-Asn, with Val and Asn at position 6 respectively, are chosen because these particular residues might confer some alternative binding specificity after the constraint imposed by position 2 in finger 3 is removed by alanine mutagenesis (Seeman, N. D., Rosenberg, J. M. & Rich, A. (1976) Proc. Natl. Acad. Sci. USA 73, 804-808; Suzuki, M. (1994) Structure 2, 317-326).

The DNA binding specificity of each middle finger is assessed before and after the alanine mutation in finger 3 by the ‘binding site signature’ method (Choo and Klug, 1994). This procedure involves screening each zinc finger phage for binding to 12 DNA libraries, each based on the DNA binding site of Zif268 but containing one fixed and two randomised nucleotide positions in the ‘middle’ triplet. Each of the possible 64 ‘middle’ triplets is present in a unique combination of three of these positionally randomised libraries, for example the triplet GAT would be found in the GNN, NAN and NNT libraries only. Hence the pattern of binding to these reveals the sequence-specificity of the middle finger.
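The combinatorial logic of the twelve libraries can be sketched as follows. This is an illustration of the bookkeeping only: the function names are hypothetical, and treating each library as simply "bound" or "not bound" is an assumption, since the actual ELISA signals are graded and are read off by inspection.

```python
from itertools import product

BASES = "ACGT"

def libraries() -> list:
    """The 12 positionally randomised libraries: one fixed base, two randomised (N)."""
    libs = []
    for position in range(3):
        for base in BASES:
            pattern = ["N", "N", "N"]
            pattern[position] = base
            libs.append("".join(pattern))
    return libs

def libraries_containing(triplet: str) -> list:
    """Libraries in which a given 'middle' triplet is represented."""
    return [
        lib for lib in libraries()
        if all(l in ("N", t) for l, t in zip(lib, triplet))
    ]

def read_signature(bound: list) -> list:
    """Triplets whose three libraries all appear among the bound libraries."""
    bound_set = set(bound)
    return [
        "".join(t) for t in product(BASES, repeat=3)
        if set(libraries_containing("".join(t))) <= bound_set
    ]

print(libraries_containing("GAT"))
print(read_signature(["GNN", "NAN", "NNT"]))
print(read_signature(["GNN", "TNN", "NGN", "NNG"]))
```

In the last call, binding to both the GNN and TNN libraries yields two candidate sites differing at the 5′ position, which is how a relaxed specificity at one base would appear in a signature.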

The detailed procedure is as described previously (Choo and Klug, 1994). Briefly, 5′-biotinylated positionally randomised oligonucleotide libraries, containing Zif268 operator variants, are synthesised by primer extension as described. DNA libraries (2 pmol/well) are added to streptavidin-coated ELISA wells (Boehringer-Mannheim) in PBS containing 50 μM ZnCl₂ (PBS/Zn). Phage solution (overnight bacteria/phage culture supernatant diluted 1:1 in PBS/Zn containing 4% Marvel, 2% Tween and 20 μg/ml sonicated salmon sperm DNA) is applied to each well (50 μl/well). Binding is allowed to proceed for one hour at 20° C. Unbound phage are removed by washing 6 times with PBS/Zn containing 1% Tween, then washing 3 times with PBS/Zn. Bound phage are detected by ELISA with horseradish peroxidase-conjugated anti-M13 IgG (Pharmacia Biotech) and quantitated using SOFTMAX 2.32 (Molecular Devices).

FIG. 6 shows that deleting Asp2 from finger 3 generally alters the pattern of acceptable bases in the ‘middle’ triplet, which is conventionally regarded as the binding site for finger 2. As would be expected according to the hypothesis set out in the introduction, the mutation affects binding at the 5′ position, while the specificity at the middle and 3′ positions remains unchanged.

The mutation generally leads to a broadening of specificity, for instance in Zif268 where removal of Asp2 in finger 3 results in a protein which is unable to discriminate the 5′ base of the middle triplet (FIG. 6 a). However, the expectation that a new 5′ base-specificity for the mutants might correlate to the identity of position 6 in finger 2 is not borne out. For example, F2-Gly would be expected to lose sequence discrimination but, although specificity is adversely affected, a slight preference for T is discernible (FIG. 6 b). Similarly, F2-Val and F2-Asn, which might have been expected to acquire specificity for one nucleotide, instead have their specificities altered by the mutation (FIGS. 6 c, d): the F2-Val mutant allows G, A and T but not C, and the F2-Asn mutant appears to discriminate against both pyrimidines. In the absence of a larger database it is not possible to deduce whether these apparent specificities are the result of amino acid-base contacts from position 6 of finger 2, and if so whether these are general interactions which should be regarded as recognition rules. The apparent discrimination of F2-Gly in particular suggests that this is unlikely to be the case, but rather that in these particular examples, other mechanisms are involved in determining sequence bias.

In contrast to the loss of discrimination seen for the other four peptides, F2-Arg continues to specify guanine in the 5′ position of the middle triplet regardless of the mutation in finger 3 (FIG. 3 e). In this case, the specificity is derived from the strong interaction between guanine and Arg6 in finger 2. This contact has been observed a number of times in zinc finger co-crystal structures (Pavletich, N. P. & Pabo, C. O. (1993) Science 261, 1701-1707; Fairall, L., Schwabe, J. W. R., Chapman, L., Finch, J. T. & Rhodes, D. (1993) Nature (London) 366, 483-487; Kim, C. & Berg, J. M. (1996) Nature Str. Biol. 3, 940-945) and is the only recognition rule which relates amino acid identity at position 6 to a nucleotide preference at the 5′ position of a cognate triplet (Choo, Y. & Klug, A. (1997) Curr. Opin. Str. Biol. 7, 117-125). This interaction is compatible with, but not dependent on, a contact to the same base-pair from Asp2 of the following finger (FIG. 7 c). Recognition of this base-pair can thus be synergistic, with the specificity potentially deriving from contacts contributed by two adjacent fingers.

This finding explains the restricted sequence specificity of fingers selected from phage display libraries based on Zif268 (Choo and Klug, 1994) and may also account for the failure to select zinc finger phage which bind to triplets with a 5′ cytosine or adenine (Rebar, E. J. & Pabo, C. O. (1994) Science 263, 671-673; Jamieson, A. C., Kim, S.-H. & Wells, J. A. (1994) Biochemistry 33, 5689-5695). FIG. 6 shows that Asp2 of Zif268 finger 3 specifically excludes adenine and cytosine from the 5′ position of the middle triplet. When this interaction is deleted, one or both of these bases become acceptable.

Preliminary modelling studies suggest that a number of amino acid residues other than aspartate may be able to make contacts to the parallel DNA strand. For instance, histidine in position 2 might make a cross-strand contact to G or T while maintaining the buttress to Arg-1. Interestingly, phage selections from randomised C-terminal finger libraries have yielded several fingers with His2, and Leu or Ser at position 1 which may also influence the binding specificity (Greisman, H. A. & Pabo, C. O. (1997) Science 275, 657-661). The crystal structures of zinc finger-DNA complexes show that Ser2 is also capable of an analogous contact to the parallel DNA strand (Pavletich et al., 1993; Kim et al., 1996). Since serine is present in about 60% of all zinc fingers (Jacobs, G. (1993) Ph.D. thesis, Cambridge Univ., Cambridge, U.K.) and can act as a donor or acceptor of a hydrogen bond, it would be surprising if this amino acid at position 2 were generally capable of contributing to the binding specificity. Rather, this contact probably stabilises the protein-DNA complex, and will be a useful device in the design of zinc finger proteins with high affinity for DNA (Choo et al., 1997). It should also be noted that Ser at position 2 has been observed in the Tramtrack structure to contact the 3′ base of a triplet in the antiparallel DNA strand, although this requires a deformation of the DNA (Fairall et al., 1993).

To determine the contribution of Asp2 in finger 3 to the binding strength, apparent equilibrium dissociation constants are determined for Zif268 and F2-Arg before and after the Ala mutation (FIG. 7). Procedures are as described previously (Choo and Klug, 1994). Briefly, appropriate concentrations of 5′-biotinylated DNA binding sites are added to equal volumes of the phage solution described above. Binding is allowed to proceed for one hour at 20° C. DNA is captured with streptavidin-coated paramagnetic beads (500 μg/well). The beads are washed 6 times with PBS/Zn containing 1% Tween, then 3 times with PBS/Zn. Bound phage are detected by ELISA with horseradish peroxidase-conjugated anti-M13 IgG (Pharmacia Biotech) and quantitated using SOFTMAX 2.32 (Molecular Devices). Binding data are plotted and analysed using Kaleidagraph (Abelbeck Software).

Both mutants show approximately a four-fold reduction in affinity for their respective binding sites under the conditions used. The reduction is likely a direct result of abolishing contacts from Asp2, rather than a consequence of changes in binding specificity at the 5′ position of the middle triplet, since the mutant Zif268 loses all specificity while F2-Arg registers no change in specificity. However, note that two stabilising interactions are abolished: an intramolecular buttressing interaction with Arg-1 on finger 3 and also the intermolecular contact with the secondary DNA strand. An independent comparison of wild-type Zif268 binding to its consensus binding site flanked by G/T or A/C also found a five-fold reduction in affinity for those sites which are unable to satisfy a contact from Asp2 to the secondary DNA strand (Smirnoff, A. H. & Milbrandt, J. (1995) Mol. Cell. Biol. 15, 2275-2287). While the effects of perturbations in the DNA structure cannot be discounted in this case, the results of both experiments would seem to suggest that the reduction in binding affinity results from loss of the protein-DNA contact. Nevertheless, the intramolecular contact between positions −1 and 2 in a zinc finger is a further level of synergy which may have to be taken into account before the full picture emerges describing the possible networks of contacts which occur at the protein-DNA interface in the region of the overlapping subsites.
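To give a sense of scale, the fold reductions in affinity quoted above can be converted into an approximate change in binding free energy. The following Python sketch is an illustration only and is not part of the described procedure: it assumes that the apparent dissociation constants behave as true equilibrium constants, uses the standard relation ΔΔG = RT ln(fold change), and takes the 20° C. incubation temperature given above. The function name `ddg_from_fold_change` is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def ddg_from_fold_change(fold, temp_c=20.0):
    """Free energy change (kcal/mol) equivalent to a fold change in Kd.

    Assumes the apparent Kd values can be treated as equilibrium constants.
    """
    temp_k = temp_c + 273.15
    return R_KCAL * temp_k * math.log(fold)

four_fold = ddg_from_fold_change(4)
five_fold = ddg_from_fold_change(5)
```

Under these assumptions a four-fold reduction in affinity corresponds to roughly 0.8 kcal/mol, and a five-fold reduction to roughly 0.9 kcal/mol.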

1. A method for preparing a nucleic acid binding protein that binds to a target nucleotide sequence, wherein the binding protein comprises a plurality of zinc fingers of the Cys2-His2 class, wherein the method comprises: i) selecting a quadruplet within the target nucleotide sequence; ii) designing the binding protein such that binding of a zinc finger to the quadruplet is obtained by choosing the sequence of particular residues of the zinc finger depending on the nucleotide sequence of the quadruplet, as follows: a) if base 4 in the quadruplet is A, then position +6 in the α-helix is Gln and position ++2 is not Asp; b) if base 4 in the quadruplet is C, then position +6 in the α-helix may be any residue, as long as position ++2 in the α-helix is not Asp; iii) synthesizing a polynucleotide encoding the binding protein of (ii); iv) introducing the polynucleotide of (iii) into a cell; and v) incubating the cell under conditions in which the encoded nucleic acid binding protein is expressed.
 2. A method according to claim 1, wherein binding to base 4 of the quadruplet by a zinc finger is additionally determined as follows: c) if base 4 in the quadruplet is G, then position +6 in the α-helix is Arg, or position +6 is Ser or Thr and position ++2 is Asp; d) if base 4 in the quadruplet is T, then position +6 in the α-helix is Ser or Thr and position ++2 is Asp.
 3. A method for preparing a nucleic acid binding protein that binds to a target nucleotide sequence, wherein the binding protein comprises a plurality of zinc fingers of the Cys2-His2 class, wherein the method comprises: i) selecting a quadruplet within the target nucleotide sequence; ii) designing the binding protein such that binding of a zinc finger to the quadruplet is obtained by choosing the sequence of particular residues of the zinc finger depending on the nucleotide sequence of the quadruplet, as follows: a) if base 4 in the quadruplet is G, then position +6 in the α-helix is Arg; or position +6 is Ser or Thr and position ++2 is Asp; b) if base 4 in the quadruplet is A, then position +6 in the α-helix is Gln and position ++2 is not Asp; c) if base 4 in the quadruplet is T, then position +6 in the α-helix is Ser or Thr and position ++2 is Asp; d) if base 4 in the quadruplet is C, then position +6 in the α-helix may be any amino acid, provided that position ++2 in the α-helix is not Asp; e) if base 3 in the quadruplet is G, then position +3 in the α-helix is His; f) if base 3 in the quadruplet is A, then position +3 in the α-helix is Asn; g) if base 3 in the quadruplet is T, then position +3 in the α-helix is Ala, Ser or Val; provided that if it is Ala, then the residues at −1 or +6 are small residues; h) if base 3 in the quadruplet is C, then position +3 in the α-helix is Ser, Asp, Glu, Leu, Thr or Val; i) if base 2 in the quadruplet is G, then position −1 in the α-helix is Arg; j) if base 2 in the quadruplet is A, then position −1 in the α-helix is Gln; k) if base 2 in the quadruplet is T, then position −1 in the α-helix is Asn or Gln; l) if base 2 in the quadruplet is C, then position −1 in the α-helix is Asp; m) if base 1 in the quadruplet is G, then position +2 is Asp; n) if base 1 in the quadruplet is A, then position +2 is not Asp; o) if base 1 in the quadruplet is C, then position +2 is not Asp; p) if base 1 in the quadruplet is T, then position +2 is Ser or Thr; iii) synthesizing a polynucleotide encoding the binding protein of (ii); iv) introducing the polynucleotide of (iii) into a cell; and v) incubating the cell under conditions in which the encoded nucleic acid binding protein is expressed.
 4. A method according to any one of claims 1-3 wherein each zinc finger has the general primary structure X^(a)-Cys-X₂₋₄-Cys-X₂₋₃-Phe-X^(c)-X-X-X-X-Leu-X-X-His-X-X-X^(b)-His-linker (SEQ ID NO: 3), with positions numbered −1 1 2 3 4 5 6 7 8 9, wherein X (including X^(a), X^(b) and X^(c)) is any amino acid.
 5. A method according to claim 4 wherein X^(a) is Phe/Tyr-X or Pro-Phe/Tyr-X.
 6. A method according to claim 5 wherein X₂₋₄ is selected from any one of: Ser-X, Glu-X, Lys-X, Thr-X, Pro-X and Arg-X.
 7. A method according to claim 4 wherein X^(b) is Thr or Ile.
 8. A method according to claim 4 wherein X₂₋₄ is Gly-Lys-Ala, Gly-Lys-Cys, Gly-Lys-Ser, Gly-Lys-Gly, Met-Arg-Asn or Met-Arg.
 9. A method according to claim 4 wherein the linker is (SEQ ID NO: 4) Thr-Gly-Glu-Lys or (SEQ ID NO: 5) Thr-Gly-Glu-Lys-Pro.
 10. A method according to claim 4 wherein position +9 is Arg or Lys.
 11. A method according to claim 4 wherein positions +1, +5 and +8 are not occupied by any one of the hydrophobic amino acids Phe, Trp or Tyr.
 12. A method according to claim 11 wherein positions +1, +5 and +8 are occupied by the residues Lys, Thr and Gln respectively.
 13. A method for preparing a nucleic acid binding protein of the Cys2-His2 zinc finger class which binds a target nucleic acid sequence, comprising the steps of: a) selecting a model zinc finger domain from the group consisting of naturally occurring zinc fingers and consensus zinc fingers; and b) mutating the finger according to the rules set in any one of claims 1 to 3.
 14. A method according to claim 13, wherein the model zinc finger is a consensus zinc finger whose structure is selected from the group consisting of the consensus structure Pro-Tyr-Lys-Cys-Pro-Glu-Cys-Gly-Lys-Ser-Phe-Ser-Gln-Lys-Ser-Asp-Leu-Val-Lys-His-Gln-Arg-Thr-His-Thr-Gly (SEQ ID NO: 6), and the consensus structure Pro-Tyr-Lys-Cys-Ser-Glu-Cys-Gly-Lys-Ala-Phe-Ser-Gln-Lys-Ser-Asn-Leu-Thr-Arg-His-Gln-Arg-Ile-His-Thr-Gly-Glu-Lys-Pro (SEQ ID NO: 7).
 15. A method according to claim 13, wherein the model zinc finger is a naturally-occurring zinc finger whose structure is selected from one finger of a protein selected from the group consisting of Zif268, GLI, Tramtrack and YY1.
 16. A method according to claim 15 wherein the model zinc finger is finger 2 of Zif268.
 17. A method according to claim 13 wherein the binding protein comprises two or more zinc finger binding motifs, placed N-terminus to C-terminus.
 18. A method according to claim 14, wherein the N-terminal zinc finger is preceded by a leader peptide having the sequence Met-Ala-Glu-Glu-Lys-Pro (SEQ ID NO: 8).
 19. A method according to claim 13 wherein the nucleic acid binding protein is obtained by recombinant nucleic acid technology, the method comprising the steps of: a) preparing a nucleic acid coding sequence encoding two or more model zinc finger domains, placed N-terminus to C-terminus; b) inserting the nucleic acid sequence into a suitable expression vector; and c) expressing the nucleic acid sequence in a host organism in order to obtain the nucleic acid binding protein.
 20. A method according to claim 19 comprising the additional steps of subjecting the nucleic acid binding protein to one or more rounds of randomisation and selection.
 21. A method according to claim 20, wherein the randomisation and selection is carried out by phage display technology.
 22. A method according to claim 21, further comprising the steps of: a) preparing a nucleic acid construct which expresses a fusion protein comprising the nucleic acid binding protein and a minor coat protein of a filamentous bacteriophage; b) preparing further nucleic acid constructs which express a fusion protein comprising a selectively mutated nucleic acid binding protein and a minor coat protein of a filamentous bacteriophage; c) causing the fusion proteins defined in steps (a) and (b) to be expressed on the surface of bacteriophage transformed with the nucleic acid constructs; and d) assaying the ability of the bacteriophage to bind the target nucleic acid sequence and selecting the bacteriophage demonstrating superior binding characteristics.
 23. A method according to claim 22 wherein the nucleic acid binding protein is selectively randomised at any one of positions +1, +5, +8, −1, +2, +3 or +6.
 24. The method of claim 3, wherein a plurality of overlapping quadruplets are selected within the target sequence.